Multivariate time series forecasting with hierarchical structure is pervasive in real-world applications, demanding not only predicting each level of the hierarchy, but also reconciling all forecasts to ensure coherency, i.e., the forecasts should satisfy the hierarchical aggregation constraints. Moreover, the disparities of statistical characteristics between levels can be huge, worsened by non-Gaussian distributions and non-linear correlations. To this extent, we propose a novel end-to-end hierarchical time series forecasting model, based on conditioned normalizing flow-based autoregressive transformer reconciliation, to represent complex data distribution while simultaneously reconciling the forecasts to ensure coherency. Unlike other state-of-the-art methods, we achieve the forecasting and reconciliation simultaneously without requiring any explicit post-processing step. In addition, by harnessing the power of deep model, we do not rely on any assumption such as unbiased estimates or Gaussian distribution. Our evaluation experiments are conducted on four real-world hierarchical datasets from different industrial domains (three public ones and a dataset from the application servers of Alipay's data center) and the preliminary results demonstrate efficacy of our proposed method.
translated by 谷歌翻译
We describe PromptBoosting, a query-efficient procedure for building a text classifier from a neural language model (LM) without access to the LM's parameters, gradients, or hidden representations. This form of "black-box" classifier training has become increasingly important as the cost of training and inference in large-scale LMs grows. But existing black-box LM classifier learning approaches are themselves computationally inefficient, typically specializing LMs to the target task by searching in a large space of (discrete or continuous) prompts using zeroth-order optimization methods. Instead of directly optimizing in prompt space, PromptBoosting obtains a small pool of prompts via a gradient-free approach and then constructs a large pool of weak learners by pairing these prompts with different elements of the LM's output distribution. These weak learners are then ensembled using the AdaBoost algorithm. The entire learning process requires only a small number of forward passes and no backward pass. Experiments show that PromptBoosting achieves state-of-the-art performance in multiple black-box few-shot classification tasks, and matches or outperforms full fine-tuning in both few-shot and standard learning paradigms, while training 10x faster than existing black-box methods.
translated by 谷歌翻译
Robustness evaluation against adversarial examples has become increasingly important to unveil the trustworthiness of the prevailing deep models in natural language processing (NLP). However, in contrast to the computer vision domain where the first-order projected gradient descent (PGD) is used as the benchmark approach to generate adversarial examples for robustness evaluation, there lacks a principled first-order gradient-based robustness evaluation framework in NLP. The emerging optimization challenges lie in 1) the discrete nature of textual inputs together with the strong coupling between the perturbation location and the actual content, and 2) the additional constraint that the perturbed text should be fluent and achieve a low perplexity under a language model. These challenges make the development of PGD-like NLP attacks difficult. To bridge the gap, we propose TextGrad, a new attack generator using gradient-driven optimization, supporting high-accuracy and high-quality assessment of adversarial robustness in NLP. Specifically, we address the aforementioned challenges in a unified optimization framework. And we develop an effective convex relaxation method to co-optimize the continuously-relaxed site selection and perturbation variables and leverage an effective sampling method to establish an accurate mapping from the continuous optimization variables to the discrete textual perturbations. Moreover, as a first-order attack generation method, TextGrad can be baked into adversarial training to further improve the robustness of NLP models. Extensive experiments are provided to demonstrate the effectiveness of TextGrad not only in attack generation for robustness evaluation but also in adversarial defense.
translated by 谷歌翻译
Generative models have been widely studied in computer vision. Recently, diffusion models have drawn substantial attention due to the high quality of their generated images. A key desired property of image generative models is the ability to disentangle different attributes, which should enable modification towards a style without changing the semantic content, and the modification parameters should generalize to different images. Previous studies have found that generative adversarial networks (GANs) are inherently endowed with such disentanglement capability, so they can perform disentangled image editing without re-training or fine-tuning the network. In this work, we explore whether diffusion models are also inherently equipped with such a capability. Our finding is that for stable diffusion models, by partially changing the input text embedding from a neutral description (e.g., "a photo of person") to one with style (e.g., "a photo of person with smile") while fixing all the Gaussian random noises introduced during the denoising process, the generated images can be modified towards the target style without changing the semantic content. Based on this finding, we further propose a simple, light-weight image editing algorithm where the mixing weights of the two text embeddings are optimized for style matching and content preservation. This entire process only involves optimizing over around 50 parameters and does not fine-tune the diffusion model itself. Experiments show that the proposed method can modify a wide range of attributes, with the performance outperforming diffusion-model-based image-editing algorithms that require fine-tuning. The optimized weights generalize well to different images. Our code is publicly available at https://github.com/UCSB-NLP-Chang/DiffusionDisentanglement.
translated by 谷歌翻译
Low-dose computed tomography (CT) plays a significant role in reducing the radiation risk in clinical applications. However, lowering the radiation dose will significantly degrade the image quality. With the rapid development and wide application of deep learning, it has brought new directions for the development of low-dose CT imaging algorithms. Therefore, we propose a fully unsupervised one sample diffusion model (OSDM)in projection domain for low-dose CT reconstruction. To extract sufficient prior information from single sample, the Hankel matrix formulation is employed. Besides, the penalized weighted least-squares and total variation are introduced to achieve superior image quality. Specifically, we first train a score-based generative model on one sinogram by extracting a great number of tensors from the structural-Hankel matrix as the network input to capture prior distribution. Then, at the inference stage, the stochastic differential equation solver and data consistency step are performed iteratively to obtain the sinogram data. Finally, the final image is obtained through the filtered back-projection algorithm. The reconstructed results are approaching to the normal-dose counterparts. The results prove that OSDM is practical and effective model for reducing the artifacts and preserving the image quality.
translated by 谷歌翻译
Graphic User Interface (GUI) is facing great demand with the popularization and prosperity of mobile apps. Automatic UI code generation from UI design draft dramatically simplifies the development process. However, the nesting layer structure in the design draft affects the quality and usability of the generated code. Few existing GUI automated techniques detect and group the nested layers to improve the accessibility of generated code. In this paper, we proposed our UI Layers Group Detector as a vision-based method that automatically detects images (i.e., basic shapes and visual elements) and text layers that present the same semantic meanings. We propose two plug-in components, text fusion and box attention, that utilize text information from design drafts as a priori information for group localization. We construct a large-scale UI dataset for training and testing, and present a data augmentation approach to boost the detection performance. The experiment shows that the proposed method achieves a decent accuracy regarding layers grouping.
translated by 谷歌翻译
尽管促进机器学习(ML)公平的最新进展激增,但现有的主流方法主要需要培训或填充神经网络的整个权重以满足公平标准。但是,由于较大的计算和存储成本,低数据效率和模型隐私问题,对于那些大规模训练的模型来说,这通常是不可行的。在本文中,我们提出了一种称为FairreProgragr的新的通用公平学习范式,该范式结合了模型重编程技术。具体而言,Fairreprogrogram考虑了固定的神经模型,而是将输入一组扰动(称为公平触发器)附加到,该触发触发器在Min-Max公式下朝着公平标准调整为公平触发器。我们进一步介绍了一个信息理论框架,该框架解释了为什么以及在什么条件下,使用公平触发器可以实现公平目标。我们从理论和经验上都表明,公平触发器可以通过提供错误的人口统计信息来有效地掩盖固定ML模型的输出预测中的人口偏见,从而阻碍模型利用正确的人口统计信息来进行预测。对NLP和CV数据集进行的广泛实验表明,与在两个广泛使用的公平标准下,基于培训成本和数据依赖性的基于重新培训的方法相比,我们的方法可以实现更好的公平性改进。
translated by 谷歌翻译
知识图(kg)嵌入是一种主流方法,用于推理不完整的kg。但是,受其固有浅层和静态体系结构的限制,它们几乎无法处理对复杂逻辑查询的不断上升,这些查询包括逻辑运算符,估算的边缘,多个源实体和未知的中间实体。在这项工作中,我们通过掩盖的预训练和微调策略介绍了知识图变压器(kgtransformer)。我们设计了一种kg三重变换方法,以使变压器能够处理kg,这是通过稀疏(MOE)稀疏激活的混合物进一步增强的。然后,我们将复杂的逻辑查询作为掩盖预测提出,并引入了两阶段掩盖的预训练策略,以提高可转移性和概括性。在两个基准上进行的广泛实验表明,KGTRANSFORMER可以始终超过基于KG的基准和九个内域和室外推理任务的高级编码。此外,KGTRANSFORMER可以通过提供解释给定答案的完整推理路径来解释性。
translated by 谷歌翻译
构建强大的通用对象检测框架需要扩展到更大的标签空间和更大的培训数据集。但是,大规模获取数千个类别的注释是高昂的成本。我们提出了一种新颖的方法,该方法利用了最近的视觉和语言模型中可用的丰富语义来将对象定位和分类在未标记的图像中,从而有效地生成了伪标签以进行对象检测。从通用和类别的区域建议机制开始,我们使用视觉和语言模型将图像的每个区域分类为下游任务所需的任何对象类别。我们在两个特定的任务(开放式摄影检测检测)中演示了生成的伪标签的值,其中模型需要概括为看不见的对象类别以及半监督对象检测,其中可以使用其他未标记的图像来改善模型。我们的经验评估显示了伪标签在这两个任务中的有效性,我们在其中优于竞争基准并实现了开放式摄制对象检测的新颖最新。我们的代码可在https://github.com/xiaofeng94/vl-plm上找到。
translated by 谷歌翻译
该技术报告提出了一种有效的自动驾驶运动预测方法。我们开发了一种基于变压器的方法,用于输入编码和轨迹预测。此外,我们提出了时间流动头来增强轨迹编码。最后,使用了有效的K均值集合方法。使用我们的变压器网络和集合方法,我们以1.90的最新Brier-Minfde得分赢得了Argoverse 2 Motion预测挑战的第一名。
translated by 谷歌翻译